Entry Name:  "UBA-Rukavina-Group51-MC3"

VAST Challenge 2014
Mini-Challenge 3

 

 

Team Members:

Andrei Rukavina, Universidad de Buenos Aires, rukavina.andrei@gmail.com  PRIMARY

Mariana Landoni, Universidad de Buenos Aires, mariana.landoni@gmail.com

Paulina Verasay, Universidad de Buenos Aires, pauliverasay@gmail.com

Maria Traverso, Universidad de Buenos Aires, lautraverso@gmail.com

 

Student Team:  YES

 

Team Number: 51

 

Streaming User ID: rukavina.andrei@gmail.com

 

Analytic Tools Used:

Tableau 8.1

Qlikview 11

Access (2010)

Excel (2013)

Python 2.7.3

Wordle.net (http://www.wordle.net/)

RAW (http://raw.densitydesign.org/)

Camtasia Studio 8

 

Approximately how many hours were spent working on this submission in total?

200 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

 

Video:

https://www.flickr.com/photos/124558678@N06/14587539366/

UBA_Rukavina_MC3

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

Please note - this challenge contains a question that is time-dependent.  Within 3 hours of starting the final data stream, send an email to VASTChal2014MC3@vacommunity.org containing your answer to question MC3.1.  Please include a copy of your answer to MC3.1 in your final answer form also. Your answers to MC3.2 and MC3.3, along with your video, are due July 8.

 

The responses to these questions should be incrementally built, as you (the contestant) acquire information from each streaming data segment you receive.  Your submission will answer these questions in consideration of all of the streaming data segments.

 

 

MC3.1 - Within 3 hours of starting the final data stream, send an email to VASTChal2014MC3@vacommunity.org containing:

a.       An image showing the streaming data in your visual analytics tool. In this image, identify an event of interest that you intend to investigate further.

b.      The content of the final message in the data stream

 

We used “Qlikview” to display the time series of streaming records. The chart shows the number of messages per second on 23 January 2014 as a function of time, with a red line for Microblog Messages (mbdata) and a blue line for Call Center Data (ccdata). The number of both mbdata and ccdata messages received peaks at 20:10. At that time a shooting is taking place and the police appear to be beginning negotiations with the terrorists.
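Outside Qlikview, a comparable series could be approximated with a few lines of Python. This is only a sketch under stated assumptions: it bins by minute rather than by second, and the names messages_per_minute and peak_minute are illustrative, not part of our tooling.

```python
from collections import Counter
from datetime import datetime


def messages_per_minute(records):
    """records: iterable of (msg_type, timestamp) pairs.

    Returns {msg_type: Counter({minute_start: message_count})}.
    Binning by minute is an assumption made for this sketch.
    """
    series = {}
    for msg_type, ts in records:
        minute = ts.replace(second=0, microsecond=0)
        series.setdefault(msg_type, Counter())[minute] += 1
    return series


def peak_minute(counter):
    """Return the (minute_start, count) pair with the highest count."""
    return max(counter.items(), key=lambda item: item[1])
```

Calling peak_minute on the mbdata and ccdata counters separately gives the busiest minute for each source, which can then be compared with the peak read off the chart.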

 

 

 

The content of the last message is:

 

mbdata|RT @KronosStar There has been an explosion from inside the apartment building.  Several people are down. #KronosStar #DancingDolphinFire #AFDHeroes|20140123213200|0|gardener4958|RT @KronosStar There has been an explosion from inside the apartment building.  Several people are down. #KronosStar #DancingDolphinFire #AFDHeroes|||

 

This content was modified using Java to generate something like:

 

line = Joiner.on("|").useForNull("[N/A]").join(Arrays.asList(
        msgType, content, dateTime, i, author, content, lat, longitude, location));
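A record in this layout could be split back into its nine fields roughly as follows. This is a minimal Python sketch, not the code we ran: the field names in FIELDS and the function parse_record are illustrative assumptions, and it assumes the message text itself never contains the "|" character.

```python
from datetime import datetime

# Field order follows the Java Joiner call; the names are illustrative.
FIELDS = ["msg_type", "content", "date_time", "i", "author",
          "content_copy", "lat", "longitude", "location"]


def parse_record(line):
    """Split one pipe-delimited stream record into a dict."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = dict(zip(FIELDS, parts))
    # Timestamps look like 20140123213200 (year, month, day, hour, minute, second).
    record["date_time"] = datetime.strptime(record["date_time"], "%Y%m%d%H%M%S")
    # Empty or "[N/A]" location fields are treated as missing.
    for key in ("lat", "longitude", "location"):
        if record[key] in ("", "[N/A]"):
            record[key] = None
    return record
```

Applied to the last message shown earlier, this yields the author gardener4958, a timestamp of 21:32:00 on 23 January 2014, and no location information.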

 

 

MC 3.2 - Describe the timeline of up to five major events that you discover in the streaming data. This timeline should include information from all three segments of the data stream if needed.    Use specific microblog records and call center data to support your description, but do not simply mimic back the data stream.  Provide a concise description of important participants, locations and durations.  Focus your response on the events themselves, rather than on the individuals reporting the events. Please limit your answer to no more than ten images and 1500 words.

 

To identify and exclude spam and junk messages, we analyzed the messages by author. We counted the total number of messages sent by each user and, for each user, how many of those messages contained exactly the same text. To quantify text repetition, we built a spam index by dividing the number of messages with different content sent by a user by the total number of messages sent by that user. A low spam index value indicates many messages with the same text and is therefore associated with a high likelihood of spam.
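The spam index described above can be sketched in Python as follows. The function name spam_index and the (author, text) input layout are assumptions made for illustration; the ratio itself is the one defined in the text (distinct texts divided by total messages).

```python
from collections import defaultdict


def spam_index(messages):
    """messages: iterable of (author, text) pairs.

    Returns {author: (total_messages, distinct_texts, index)} where
    index = distinct_texts / total_messages. Values close to 0 mean
    the author keeps repeating the same text.
    """
    texts_by_author = defaultdict(list)
    for author, text in messages:
        texts_by_author[author].append(text)

    result = {}
    for author, texts in texts_by_author.items():
        total = len(texts)
        distinct = len(set(texts))
        result[author] = (total, distinct, distinct / float(total))
    return result
```

For example, an author who sends the same text four times gets an index of 1/4 = 0.25, while an author whose two messages differ gets 2/2 = 1.0.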

 

 

As shown in the bar graph above, KronosQuoth and Clevvah4Evah sent large numbers of messages (1,265 and 153, respectively), but their spam index is very low. We found that the contents of these messages were not relevant to the questions asked, thus we considered them spam or junk and excluded them from further analyses.

 

Timeline of Events

We detected four distinct events by examining the pattern of repeating hashtags within short periods:

·         Rally of the Protectors of Kronos (POK) in Abila City Park;

·         Fire at the Dancing Dolphin apartment complex;

·         Black van hit and run;

·         Shooting at Gelato Galore ice cream parlor.

 

Event 1. The first event took place on 23 January 2014 starting at 17:00. This event was a rally in Abila City Park organized by the Protectors of Kronos (POK). The park is bounded by Pilau St. on the west, Parla St. on the east, Achilleos St. and Ermou St. on the north and Egeou St. on the south.

The Protectors of Kronos (POK) is a political activist movement that was started in 1997 as a small group of seven citizens concerned about contamination from drilling at the Tiskele Bend gas fields in Kronos. One of its charismatic leaders was Elian Karel, who died on 19 June 2009 – apparently from a heart attack – after being held in prison for three months.

The rally hostess is Sylvia Marek, one of the leaders of POK. She is also the leader and co-founder of Save Our Wildlands (SOW), a small environmental activist group associated with POK. Special guests at the rally include (a) Dr. Audrey McConnel Newman, an internationally renowned environmental scientist from the United States, (b) Lucio Jakab, who co-founded SOW with Sylvia Marek, and (c) Professor Lorenzo Di Stefano, who teaches Environmental Science at the University of Abila. The band Victor-E is playing at the rally. A timeline of events occurring during the rally is shown in the table below. For all four events, the developments and their times were extracted from the microblog and call center data.

 

Event 2. The second event started at 18:25 and involved a fire at the Dancing Dolphin apartment complex, located at the corner of N Achilleos St. and N Madeg St. A timeline of events occurring during the fire is shown in the table below.

 

Event 3. The third event involves two consecutive hit and run incidents starting at 19:19. The driver of a black van hits a car first and a cyclist afterwards. A timeline of developments in this event is shown in the table below.

Event 4. The fourth event is a shooting, starting at 19:39, in the parking lot of “Gelato Galore”, an ice cream parlor at the corner of N Alexandrias St. and N Ithakis St., near Abila City Park. A timeline of events occurring during the shooting is shown in the table below.

 

The next graph shows the starting time and approximate duration of the four detected events. Data considered: hashtags from mbdata and all ccdata. Hashtags that were considered unrelated to the four events are not displayed.

TimeLine_v1

 

We used the hashtags in the messages to identify each event's start time and approximate end time. The first event, POK Rally, starts at 17:00 and ends around 20:15. It is marked by hashtags such as POKRally, Rally, POK, and POKrallyinthepark.

The second event, Fire in Dancing Dolphin apartment complex, starts around 18:40. It is marked by microblog messages with hashtags such as Fire, DancingDolphinFire, AFD (Abila Fire Department), and AFDheroes.

The third event, Black van hit and run, starts at 19:20. It is marked by ccdata entries such as “All Units Broadcast Felony Hit and Run – in progress” and by the hashtags Jerkdrivers and APD (Abila Police Department).

The last event, Shooting at Gelato Galore, starts at 19:40. It is marked by the hashtags Troubleatgelato, TAG, Gelatogalorestandoff, Blackvan and Hostage.
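Start and end times of this kind can be read off programmatically by recording when each hashtag is first and last seen. The Python sketch below illustrates the idea; the function name hashtag_spans, the regular expression, and the (timestamp, text) input layout are assumptions, and hashtags are compared case-insensitively.

```python
import re
from datetime import datetime

# A hashtag is taken to be "#" followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_spans(records):
    """records: iterable of (timestamp, text) pairs.

    Returns {hashtag_lowercase: (first_seen, last_seen, count)}.
    """
    spans = {}
    for ts, text in records:
        for tag in HASHTAG_RE.findall(text):
            key = tag.lower()
            first, last, count = spans.get(key, (ts, ts, 0))
            spans[key] = (min(first, ts), max(last, ts), count + 1)
    return spans
```

A hashtag with a high count concentrated in a short span is a candidate event marker; hashtags that appear throughout the evening are more likely to be background topics.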

 

The stream graph below complements the previous visualization.

 

Streamgraph_v2

 

For a better visualization we grouped hashtags and ccdata that are related. The groups are defined as follows:

·         Abila: Abila, Abilacitypark, Abilafinest, Abilajobs, Abilaparadise, Abilaprays, Abilawatcher

·         AFD: AFD, AFDheroes, AFDheros

·         APD: APD

·         Fire: DancingDolphinFire, Dancingdolfinfire, Dancingdolphin, Dancingdolphinsfire, Dancingdophinsfire, Dancingfire, Dansingdolfinfire, Dansingdolphinfire

·         Gelato: Troubleatgelato, TAG, Standoffover, Standoff, Shooting, GG, Gelatogalorestandofff, Gelatogalore

·         News Media: HI, KronosStar, AbilaPost, IntNews, NewsOnline, CentralBulletin

·         POK Rally: POKRally, Rally, POKRallyinthepark, POKliesinthepark, Park, Parkcheck, Rallypark

·         Van Accident: Jerkdrivers, Pursuit Continues, Pursuit, Suspicious Occupied Vehicle-Black Van, Vehicle Accident-Report
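The grouping above amounts to a lookup table from hashtag to group, followed by a count per time bin, which is the kind of input a stream graph needs. The Python sketch below shows one way to do this. It includes only some of the groups for brevity, and the 10-minute bin width and the names GROUPS, TAG_TO_GROUP and group_counts are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# A subset of the groups listed above; the others follow the same pattern.
GROUPS = {
    "AFD": ["AFD", "AFDheroes", "AFDheros"],
    "APD": ["APD"],
    "Gelato": ["Troubleatgelato", "TAG", "Standoffover", "Standoff",
               "Shooting", "GG", "Gelatogalorestandofff", "Gelatogalore"],
    "POK Rally": ["POKRally", "Rally", "POKRallyinthepark",
                  "POKliesinthepark", "Park", "Parkcheck", "Rallypark"],
}

# Reverse lookup: lowercase hashtag -> group name.
TAG_TO_GROUP = {tag.lower(): group
                for group, tags in GROUPS.items() for tag in tags}


def group_counts(tagged, bin_minutes=10):
    """tagged: iterable of (timestamp, hashtag) pairs.

    Returns a Counter keyed by (bin_start, group). bin_minutes is
    assumed to divide 60 evenly. Unknown hashtags are ignored.
    """
    counts = Counter()
    for ts, tag in tagged:
        group = TAG_TO_GROUP.get(tag.lstrip("#").lower())
        if group is None:
            continue
        bin_start = ts.replace(minute=ts.minute - ts.minute % bin_minutes,
                               second=0, microsecond=0)
        counts[(bin_start, group)] += 1
    return counts
```

Matching on lowercase hashtags keeps capitalization variants in the same group, while the listed misspellings still need their own entries.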

 

The stream graph shows that the hashtags related to the POK Rally event are used between 17:00 and 19:15. Hashtags related to Abila and News Media are used the whole time. The APD hashtag has periods during which it is not used. The hashtags grouped in Fire and AFD start around 18:40. The hashtags grouped in Van Accident are used between 19:10 and 19:40, and those grouped in Gelato are used between 19:30 and 21:00.

 

The following visualization is a mood analysis graph that displays subjectivity (grey line) and polarity (green and red bars).

Mood Analysis

The grey peaks show high subjectivity in the messages. The highest peak occurs when the shooting at Gelato Galore starts. The other peaks occur during the speeches at the rally, when the fire starts at the Dancing Dolphin apartments, and during the hit and run incident.

Overall, the messages show positive feelings. The strongest negative feelings appear when the fire and the shooting start, consistent with the fear expressed in the messages.

 

 

MC 3.3 – Select one of your five major events from question MC 3.2 that you consider to be most likely to provide additional clues to the investigation of the GASTech disappearances.   Describe the roles of the participants.  Describe how other events you identified in MC3.2 may have influenced your selected event. Provide a hypothesis and evidence as to whom you suspect as being directly involved in the GAStech disappearances, either as perpetrators or victims.  Please limit your response to no more than five images and 500 words.

 

We consider the shooting at Gelato Galore as the event that can provide additional clues to the investigation of the GASTech disappearances. For this reason we decided to assess the messages that were sent between 19:30 and 21:32.

To identify the participants in this event, we prepared a tree map by author, which displays the relative importance of each author's participation.
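The ranking behind such a view can be approximated by counting messages per author inside the time window of the event. The Python sketch below assumes (timestamp, author) pairs as input; the function name top_authors and the default of ten authors are illustrative choices.

```python
from collections import Counter
from datetime import datetime


def top_authors(records, start, end, n=10):
    """records: iterable of (timestamp, author) pairs.

    Counts messages per author with start <= timestamp <= end and
    returns the n most active authors as (author, count) pairs.
    """
    counts = Counter(author for ts, author in records if start <= ts <= end)
    return counts.most_common(n)
```

For the shooting event, start and end would be 19:30 and 21:32 on 23 January 2014, the window stated above.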

 

 

The authors who sent the most messages during the event are:

 

We built a “word cloud” based on the text of the messages sent during the event by these 10 authors.

 

Shotting_event-word_cloud_by_message_author.jpg

 

We can detect more participants based on the previous image:

Police: They block the suspicious black van at Gelato Galore. They engage in a firefight with the van’s driver, and a policeman is wounded. They hold their position under fire until the SWAT team arrives. When SWAT arrives, they start evacuating “Carly’s Coffee” and “General Grocer”.

SWAT team: They arrive at the scene at Gelato Galore. They negotiate with the terrorists and finally free the hostages when the terrorists surrender.

Terrorist: He shoots a policeman and seems out of his mind. He shouts during the negotiations and threatens to kill a hostage.

Hostages: Nobody can see them. Two female hostages were rescued safely, but no names were released.

 

The shooting event is related to the van hit and run event because the same black van is involved. After hitting a car and running over a cyclist, the van driver flees the scene. Chased by the police, he enters a parking lot with no exit; feeling trapped, he starts shooting at the police officers blocking his way.

The next radar graph shows the connection between the two events.

 

MC3.3 Roles

 

We can conclude that two of the missing GASTech employees (two women) were the ones inside the black van. They were rescued because their captors hit a car and ran over a cyclist, and that event set off the police chase of the black van.

We consider the author with the username Officia1AbilaPost a suspect. First, this name can be mistaken for “Official Abila Post”, since it only replaces the letter l with the digit 1. Second, this author sends messages with misinformation regarding the missing employees, apparently trying to divert attention. We found this author because s/he used the hashtag #GASTech.
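A digit-for-letter substitution of this kind can also be checked mechanically. The Python sketch below is a heuristic we did not use in the analysis: the substitution table, the function names normalize and find_lookalikes, and the list of trusted outlet names passed in by the caller are all assumptions made for illustration.

```python
# Hypothetical table of digits commonly used in place of letters.
SUBSTITUTIONS = {"1": "l", "0": "o", "3": "e", "5": "s"}


def normalize(name):
    """Lowercase a username and undo digit-for-letter substitutions."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in name.lower())


def find_lookalikes(candidates, trusted):
    """Flag usernames that contain a substituted digit and, once
    normalized, embed the name of a trusted account.

    Returns a list of (candidate, trusted_name) pairs.
    """
    hits = []
    for name in candidates:
        plain = name.lower()
        cleaned = normalize(name)
        if cleaned == plain:
            continue  # no digit-for-letter substitution present
        for official in trusted:
            if official.lower() in cleaned:
                hits.append((name, official))
                break
    return hits
```

With AbilaPost and KronosStar as trusted names, Officia1AbilaPost is flagged because it normalizes to officialabilapost, while AbilaPost itself and an ordinary username such as gardener4958 are not.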

Considering information from Challenge 1, we think that one of the hostages is Rachel Pantanal, Executive Assistant to the GASTech CIO, and that one of the kidnappers is Isia Vann, the brother of Juliana Vann.

We arrived at this conclusion by searching the microblog messages for any mention of a GASTech employee. One message said that its author had not heard from Rachel for several days.

In Challenge 1 we also found emails from Isia Vann that we consider to be harassment of Rachel.